Landscape genomics reveals genetic signals of environmental adaptation of African wild eggplants

Abstract Crop wild relatives (CWR) provide a valuable resource for improving crops. They possess desirable traits that confer resilience to various environmental stresses. To fully utilize crop wild relatives in breeding and conservation programs, it is important to understand the genetic basis of their adaptation. Landscape genomics associates environments with genomic variation and allows for examining the genetic basis of adaptation. Our study examined the differences in allele frequency of 15,416 single nucleotide polymorphisms (SNPs) generated through a genotyping-by-sequencing approach among 153 accessions of 15 wild eggplant relatives and two cultivated species from Africa, the principal hotspot of these wild relatives. We also explored the correlation between these variations and the bioclimatic and soil conditions at their collection sites, providing a comprehensive understanding of the genetic signals of environmental adaptation in African wild eggplant. Redundancy analysis (RDA) results showed that the environmental variation explained 6% while the geographical distances among the collection sites explained 15% of the genomic variation in the eggplant wild relative populations when controlling for population structure. Our findings indicate that even though environmental factors are not the main driver of selection in eggplant wild relatives, they are influential in shaping the genomic variation over time. The selected environmental variables and candidate SNPs effectively revealed grouping patterns according to the environmental characteristics of sampling sites. Using four genotype–environment association methods, we detected 396 candidate SNPs (2.5% of the initial SNPs) associated with eight environmental factors. Some of these SNPs signal genes involved in pathways that help adapt to environmental stresses such as drought, heat, cold, salinity, pests, and diseases.
These candidate SNPs will be useful for marker‐assisted improvement and characterizing the germplasm of this crop for developing climate‐resilient eggplant varieties. The study provides a model for applying landscape genomics to other crops' wild relatives.


| INTRODUCTION
Crop wild relatives possess traits of interest for breeding climate-resilient varieties because many are adapted to marginal environments (Kapazoglou et al., 2023). However, it is often unclear what specific adaptive traits they possess and to which abiotic stresses they are adapted (Rellstab et al., 2015), and linkage drag with undesirable traits makes it difficult to detect them (Chitwood-Brown et al., 2021; Huang et al., 2023). Landscape genomics is an emerging research discipline with a high potential to speed up the detection of valuable traits, supporting breeding programs with information about a wide range of associations between specific genome locations and specific environmental factors. Landscape genomics integrates spatial statistics, population genomics, and landscape ecology to rapidly discover various adaptive markers associated with a wide range of environmental factors (Haupt & Schmid, 2022; Manel et al., 2010). It has been successfully applied to detect genes associated with the environmental adaptation of wild plants (Chang et al., 2022; Lasky et al., 2015; Lei et al., 2019; Morente-López et al., 2018).
Eggplants, including brinjal eggplant (Solanum melongena L.) and African eggplants (S. aethiopicum L., S. anguivi Lam., and S. macrocarpon L.), are important vegetables globally and regionally, belonging to the Solanaceae family. Despite its significance in food production worldwide, eggplant has trailed in the development and use of genomic tools compared to other Solanaceae crops such as potato and tomato (Gramazio et al., 2018). This has changed over recent years due to the development of new genomic resources, including a high-quality de novo assembled eggplant genome (Barchi et al., 2021; Gramazio et al., 2019). These genomic resources allow us to start screening eggplant genebank accessions for genes associated with environmental adaptation. Crop wild relatives are especially interesting to screen because they possess large untapped genetic diversity and traits of environmental adaptation that disappeared from eggplant varieties during domestication and breeding.
The African wild eggplant populations are particularly interesting because sub-Saharan Africa is a hotspot of wild relatives of all domesticated eggplants, including brinjal eggplant (Aubriot et al., 2018; Syfert et al., 2016). African eggplant wild relatives have shown significant morphological variations and thrive in a myriad of ecological habitats spanning from the equatorial savanna to almost barren desert landscapes (Weese & Bohs, 2010). Most of the African wild relatives belong to either the primary or secondary gene pools of one or more of the domesticated eggplants depending on their phylogenetic relationship and success of crossing with eggplant (Knapp et al., 2013) and are amenable to interspecific hybridization with eggplant (Plazas et al., 2016; Rakha et al., 2020). These Solanum species are poorly studied for breeding purposes, and they are underrepresented in seed banks (Syfert et al., 2016). However, it is highly probable that each population has developed specific adaptations to suit its local environmental conditions, resulting in many variations.
So far, landscape genomics has not been widely applied to detect genes associated with adaptive traits in crop wild relatives. Traditionally, landscape genomics focuses on single species (Richardson et al., 2016). However, for crop wild relatives, a limited number of records are often available for individual species, and breeders are often interested in screening multiple species in the same crop gene pool for traits of interest (Engels & Thormann, 2020). Therefore, it is common that crop genomic studies cover gene pools with multiple species (Barchi et al., 2021; Lin et al., 2022; Tripodi et al., 2021). This means that by associating genetic signals of multiple crop wild relatives across the environmental gradient of a landscape, we can gain a broad insight into eco-evolutionary patterns across crop gene pools and identify various options for breeding.
In this study, we apply landscape genomics to screen African eggplant wild relatives for SNPs associated with environmental adaptation. Our objectives were to (i) evaluate the population structure of eggplant wild relatives from diverse environments, taxa, and geographies in West and East Africa; (ii) estimate the contribution of environmental, population structure, and geographic factors in shaping the genomic variation across eggplant wild relative gene pools; (iii) identify candidate SNPs and their associations with environmental factors, and apply these genotype-environment associations to predict the adaptive landscape; and (iv) investigate the potential role of genes associated with candidate SNPs in enabling local adaptation.

| Plant material
We genotyped 153 accessions of 17 eggplant species, including 15 wild species and two cultivated species (S. macrocarpon and S. aethiopicum) collected from wild or feral populations. Royal Botanic Gardens, Kew, provided the taxonomic classification, which we further checked in World Flora Online (WFO, 2023). The collections comprised several species from different West and East African countries collected during the Global Crop Wild Relatives project (http://www.cwrdiversity.org/; Dempewolf et al., 2014, 2017; Müller et al., 2021) and other initiatives. The accession collection points represent different Köppen climate zones (Figure 5; Table S3) (Beck et al., 2018). All accessions are available at the World Vegetable Center (WorldVeg) genebank (https://genebank.worldveg.org/).

| Genotyping
We isolated genomic DNA from fresh leaves of five seedlings per accession using the FavorPrep Plant Genomic DNA Extraction Mini Kit (FAVORGEN), following the manufacturer's instructions.
We then constructed the sequencing library following the approach of Elshire et al. (2011). Genomic DNA was quantified by Qubit and normalized to 100 ng in 96-well plates. We digested the DNA samples using the restriction enzyme ApeKI and ligated them with two adapters for sequencing, followed by polymerase chain reaction to amplify the target DNA fragments and complete the sequencing library preparation. A service provider performed the sequencing on the Illumina HiSeq X platform in a paired-end 150 bp run.
For the SNP calling, we mainly followed the manual of the Stacks software (Catchen et al., 2013). In short, we filtered the raw reads by quality and demultiplexed them using the process_radtags program. We then mapped the retained reads to the eggplant reference genome (Eggplant_V4.1.fa) (Barchi et al., 2021) using the Burrows-Wheeler Aligner (BWA) version 0.7.17 (Li & Durbin, 2009). We sorted and indexed the reads using Samtools version 1.15.1 (Li et al., 2009), after which we performed the variant calling using the gstacks and populations programs in Stacks to obtain a raw VCF of 1,066,587 SNPs. We then retained the SNPs and accessions with less than 20% missing data, kept SNPs with a minor allele frequency (MAF) >0.05, and applied LD pruning with a threshold of 0.1 and a window size of 50 bp, giving the final high-quality SNP dataset comprising 15,146 SNPs used for the analysis.
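The missing-data and MAF filters described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the function name `filter_snps`, the 0/1/2 genotype coding, and the use of NaN for missing calls are assumptions made for illustration, and the LD-pruning step is not included.

```python
import numpy as np

def filter_snps(genotypes, max_missing=0.2, min_maf=0.05):
    """Return a boolean mask of SNPs (columns) passing both filters.

    genotypes: 2-D array, rows = accessions, columns = SNPs, coded as
    0/1/2 copies of the alternate allele, np.nan for missing calls
    (an assumed coding, chosen for this sketch).
    """
    g = np.asarray(genotypes, dtype=float)
    missing_rate = np.isnan(g).mean(axis=0)       # fraction of missing calls
    called = (~np.isnan(g)).sum(axis=0)           # number of called genotypes
    alt_sum = np.nansum(g, axis=0)                # alternate allele count
    with np.errstate(invalid="ignore", divide="ignore"):
        # Allele frequency among called diploid genotypes
        alt_freq = np.where(called > 0, alt_sum / (2.0 * called), np.nan)
        maf = np.minimum(alt_freq, 1.0 - alt_freq)
        return (missing_rate < max_missing) & (maf > min_maf)

# Toy matrix: 5 accessions x 4 SNPs
geno = np.array([
    [0, 0, 2, np.nan],
    [1, 0, 2, np.nan],
    [2, 0, 2, 1],
    [1, 0, 1, 0],
    [0, 0, 2, 2],
])
print(filter_snps(geno))
```

In the toy matrix, the second SNP is monomorphic and the fourth has 40% missing calls, so only the first and third SNPs are retained.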

| Environmental data
We selected climate and soil data from three open-source databases for our models (Table S2). We downloaded the grids for 19 bioclimatic variables, solar radiation, wind speed, and vapor pressure derived from WorldClim 2.1 (Fick & Hijmans, 2017) at a resolution of 2.5 arc-minutes. The 19 bioclimatic variables were each downloaded as annual data averages between 1970 and 2000. We averaged the monthly solar radiation, wind, and vapor pressure rasters to obtain annual value rasters for this period. We complemented the climate data set with a set of climate-related variables downloaded from CHELSA (Climatologies at High Resolution for the Earth's Land Surface Areas), a downscaled climate data set at a global resolution of 30 arcsec (~1 km) covering 1980 to 2018 (Brun et al., 2022). The variables included vapor pressure deficit, potential evapotranspiration, climate moisture, growing degree days, growing season length, growing season temperature, and growing season precipitation. Soil variables included nitrogen, soil organic carbon, organic carbon density, organic carbon stock, cation exchange capacity, pH, and clay, sand, and silt content. We downloaded the soil data from the SoilGrids database released in 2016 (https://soilgrids.org/) through ISRIC-WDC Soils (Hengl et al., 2017) at 250-m resolution and at a depth of 15-30 cm, approximately the depth at which the eggplant roots can grow. We aggregated the soil dataset to the resolution of the climate data, ensuring they are consistent in both resolution and extent. We averaged the aggregated soil values using the resample and extent functions of the raster package in R (Hijmans, 2023).
For each accession, we extracted the data of the environmental variables with the extract function of the R raster package (Hijmans, 2023) using the GIS coordinates at sampling points to obtain a full data set of all the climate and soil variables. For the modeling, we selected the environmental variables based on variance inflation factors (VIFs). VIFs below five are considered to indicate low collinearity (James et al., 2013). We selected a final set of eight climate and soil variables with very low correlation for downstream analysis (Table S1, Figure S5). Following the selection of environmental variables, we tested our hypothesis of isolation by environment (IBE) versus isolation by distance (IBD) using the Mantel test with the mantel.rtest function in the R package ade4 (Dray & Dufour, 2007).
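As an illustration of VIF-based variable selection, the sketch below computes each variable's VIF as the corresponding diagonal element of the inverse correlation matrix (equivalent to 1/(1 − R²) from regressing that variable on the others) and drops the highest-VIF variable until all values fall below five. The stepwise removal order and the function names `vif` and `select_by_vif` are assumptions; the text does not state how the authors iterated.

```python
import numpy as np

def vif(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

def select_by_vif(X, names, threshold=5.0):
    """Drop the highest-VIF variable until all VIFs are below threshold."""
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        values = vif(X[:, keep])
        worst = int(np.argmax(values))
        if values[worst] < threshold:
            break                      # every remaining variable passes
        keep.pop(worst)                # remove the most collinear variable
    return [names[i] for i in keep]

# Toy data: "c" is almost a copy of "a", so one of the two is removed
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
print(select_by_vif(np.column_stack([a, b, c]), ["a", "b", "c"]))
```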

| Population structure and differentiation analysis
We used the program STRUCTURE ver. 2.3.5 (Pritchard et al., 2000) and the snmf function in the R package LEA (version 1.4.0) (Frichot et al., 2014) to investigate the population structure. We assigned the 17 species as our populations in the STRUCTURE analysis. These species represented seven eggplant clades (the Melongena, Anguivi, Arundo, Coagulans, Giganteum, Acanthophora, and Aculeastrum clades) (Aubriot et al., 2016; Syfert et al., 2016; http://www.solanaceaesource.org). Considering this phylogenetic structure, we ran sub-populations varying from K = 2 to K = 10, each with 10 independent runs and a burn-in of 10,000 iterations for each run. To determine the optimal K, we generated Delta K (ΔK) for each K with a web-based program, STRUCTURE HARVESTER ver. 0.6.94 (Earl & von Holdt, 2012). We then aligned and plotted the Delta K values against the K values with the CLUMPAK server (Kopelman et al., 2015). For comparison to the STRUCTURE analysis, we also estimated the ancestry coefficients using the function snmf in the LEA R package. For snmf, we ran subpopulations from K = 1 to K = 10 with 10 repetitions for each run. The best K was chosen using the cross-entropy criterion of all the runs.
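The Delta K statistic mentioned above can be sketched as follows. This is an illustrative implementation of the Evanno-style calculation, computed from replicate means as ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)]; it is not the STRUCTURE HARVESTER code, and the function name `delta_k` and the input layout are assumptions.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """Delta K from replicate log-likelihoods.

    lnp: dict mapping K -> list of L(K) = ln P(X|K) over replicate runs
    (at least two runs per K, with non-zero spread).
    Returns dict K -> Delta K for every K that has both neighbours.
    """
    means = {k: mean(v) for k, v in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order difference of the mean log-likelihood
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp[k])
    return result

# Hypothetical log-likelihoods for K = 1..3, two runs each
runs = {1: [-1000.0, -1002.0], 2: [-800.0, -804.0], 3: [-790.0, -794.0]}
print(delta_k(runs))
```

Because the statistic needs both neighbours, no value is produced for the smallest and largest K tested.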
The admixture coefficients (Table S1) for the optimal K value for each individual were plotted as pie charts using the map function in the maps R package (Becker et al., 2022) onto the map showing the sampling sites. We also constructed a dendrogram based on the SNP markers using the Unweighted Pair-Group Method with Arithmetic Averaging (UPGMA) in TASSEL v5.2.89 (Bradbury et al., 2007) and plotted it using TreeViewer (Bianchini & Sánchez-Baracaldo, 2024).
We analyzed the genetic differentiation by computing FST and AMOVA between genetic groups identified through population structure analysis. We computed FST values and AMOVA using genet.

| Partitioning of the genomic variation and identification of candidate SNPs
We used four genome scan methods to obtain a complementary selection of candidate SNPs.
First, we used simple redundancy analysis (RDA) to associate the obtained SNPs with the selected environmental factors without controlling for population structure and geographical distances.
RDA is a multivariate method for assessing a linear relationship between two or more factors (Legendre & Legendre, 2012). RDA was performed with the rda function of the R package vegan (Oksanen et al., 2022). We carried out 5000 permutations to test the significance of explanatory variables with the R function anova.cca.
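To make the mechanics of a simple RDA concrete, the sketch below regresses a centered SNP matrix on standardized environmental predictors and then runs a PCA (via SVD) on the fitted values. It is a minimal illustration, not the vegan implementation: the function names are assumptions, the scores are unscaled (vegan applies its own scalings), and the permutation test uses R² as the statistic, which for a model without covariates ranks permutations identically to the pseudo-F.

```python
import numpy as np

def simple_rda(Y, X):
    """Y: samples x SNPs response matrix; X: samples x predictors."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                              # center responses
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)    # standardize predictors
    coef, *_ = np.linalg.lstsq(Xs, Yc, rcond=None)       # multivariate regression
    fitted = Xs @ coef
    U, s, Vt = np.linalg.svd(fitted, full_matrices=False)  # PCA of fitted values
    k = min(X.shape[1], len(s))                          # constrained axes
    eigvals = s[:k] ** 2 / (n - 1)
    total_var = (Yc ** 2).sum() / (n - 1)
    return {
        "eigenvalues": eigvals,
        "site_scores": U[:, :k] * s[:k],
        "snp_loadings": Vt[:k].T,
        "r2": eigvals.sum() / total_var,                 # constrained proportion
    }

def permutation_p(Y, X, n_perm=999, seed=0):
    """Permutation p value for the constrained proportion (rows of Y shuffled)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed = simple_rda(Y, X)["r2"]
    hits = sum(
        int(simple_rda(Y[rng.permutation(len(Y))], X)["r2"] >= observed)
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)
```

The `snp_loadings` matrix (SNPs by axes) is the quantity typically screened for outliers in RDA-based genome scans.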
Second, we used partial RDA, which allows the partitioning of genomic variation into components explained by different factors.
To partition the genomic variation into these components, we conducted a full model RDA with the selected environmental factors, spatial autocorrelation, and population structure and then did a partial RDA conditioned on covariates to estimate the proportion of SNP variation explained by the factors that were included in the model. We did the partial RDA with the rda function and the same number of permutations as for the simple RDA explained above. To account for the effect of spatial autocorrelation on SNP variation in the partial RDA analysis, we applied distance-based Moran's eigenvector maps (dbMEMs) in RDA (Dray et al., 2006; Legendre & Legendre, 2012).
This involved first building a neighborhood connection network of the 153 collection points. With this network, we constructed a spatial weighting matrix of inverse geographical distances (km⁻¹) following the method by Forester et al. (2018). The spatial weighting matrix was then decomposed to generate dbMEMs. Subsequently, we performed forward selection with the forward.sel function (Dray et al., 2022) to identify dbMEMs that associate significantly with spatial genetic structure. We then applied the selected dbMEMs in RDA to capture comprehensive spatial autocorrelation (Table S5).
To account for the effect of population structure on SNP variation in the partial RDA analysis, we used the ancestry coefficients estimated by the STRUCTURE program with the optimal K (K = 8) as covariates. In RDA, SNP outliers were defined as the SNPs with loadings more than 3 SDs from the mean along any of the first three RDA axes, following Capblancq et al. (2018).
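The outlier rule described above (loadings beyond 3 SDs from the mean of an axis) can be sketched as below. The function name `rda_outliers` and the layout of the loadings matrix (SNPs in rows, RDA axes in columns) are assumptions made for illustration.

```python
import numpy as np

def rda_outliers(loadings, n_sd=3.0):
    """Flag SNPs whose loading lies more than n_sd SDs from the axis mean.

    loadings: SNPs x axes array of RDA SNP loadings.
    Returns dict axis index -> array of flagged SNP row indices.
    """
    L = np.asarray(loadings, dtype=float)
    center = L.mean(axis=0)
    spread = L.std(axis=0, ddof=1)
    flagged = np.abs(L - center) > n_sd * spread
    return {axis: np.flatnonzero(flagged[:, axis]) for axis in range(L.shape[1])}

# Toy loadings: 200 SNPs x 3 axes with two planted extreme values
base = np.tile([-0.01, 0.01], 100)
toy = np.column_stack([base, base, base])
toy[5, 0] = 1.0
toy[17, 2] = -1.0
hits = rda_outliers(toy)
candidates = sorted(set(np.concatenate(list(hits.values())).tolist()))
print(candidates)
```

Taking the union across axes, as in the last lines, gives one candidate list per model; a SNP flagged on several axes is counted once.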
In the preliminary analysis, we noticed that population structure clustering was largely related to eco-geographical habitats. To separate the connections of population structure to environment and geography, we calculated the proportion of population structure attributed to environmental factors and spatial autocorrelation.
For this, we replaced the SNPs with the ancestry coefficients from the STRUCTURE analysis with the optimal K as the response variables in the RDA models.
Third, we used a latent factor mixed model (LFMM), a univariate method, to associate the obtained SNPs with the selected environmental factors. LFMM, implemented in the R package lfmm (Caye et al., 2019), allows control for population structure. In the LFMM analysis, we controlled for population structure using the optimal K, which we initially determined using the STRUCTURE program, and computed q values. We considered SNPs with a false discovery rate (FDR) <0.05 to be candidate SNPs.
Fourth, we used PCAdapt as an outlier differentiation method.
The R package PCAdapt uses a PCA-based approach to simultaneously infer population structure and identify outlier loci related to this structure. In contrast to the three previous methods, it does not return associations between obtained SNPs and the selected environmental factors (Luu et al., 2016). With this different approach, we expected to capture other candidate SNPs not yet captured by the three genotype-environment association methods explained above. We adjusted the p values using the Bonferroni method in the p.adjust function of the R stats package (R Development Core Team). After that, we applied an FDR ≤0.05 as the significance level for detecting the outlier loci.
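Because the text refers both to a Bonferroni adjustment and to an FDR threshold, the sketch below contrasts the two corrections on the same p values. It is a generic illustration, not the authors' code: the Benjamini-Hochberg procedure is shown as one common way to control the FDR, and the function names are assumptions.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p values: p * m, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values (FDR control)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

With these four p values and a 0.05 cutoff, the Bonferroni adjustment retains two SNPs and the Benjamini-Hochberg adjustment retains all four, which shows why the choice of correction affects the number of reported outliers.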

| Prediction of adaptive landscapes with a selected set of candidate SNP markers
After identifying the candidate SNP markers and their associated environmental factors from all four methods mentioned, it is possible to predict the level of candidate SNP markers in a specific environment. The model shows the geographic patterns in environmental adaptation on a grid map of a so-called adaptive landscape (Capblancq & Forester, 2021). We carried out a simple RDA to model the adaptive landscape with the set of candidate SNPs from the four methods explained above and a set of environmental variables most strongly correlated with the putative adaptive variation. In this case, we used the candidate SNPs as the multivariate response in the simple RDA and the selected environmental variables as the explanatory variables. After that, we calculated an adaptive index for each geographic grid cell of the adaptive landscape based on the genotype-environment associations following the procedure outlined by Steane et al. (2014). The adaptive index provides an estimate of the adaptive similarity or difference of all the grid cells in the landscape as a function of the environmental predictor values of each grid cell (Capblancq & Forester, 2021). We geographically mapped the indices for RDA axes 1 and 2 using the R package ggplot2 (Wickham, 2016). Visualizing the adaptive landscape enabled us to observe the geographic distribution of the adaptive alleles across the population ranges of the crop wild relatives involved. A higher positive or negative adaptive index score is associated with changes in allele frequency of candidate SNPs across environmental gradients. The modeled landscape was limited to the geographic areas of the 153 populations of eggplant wild relatives using the st_convex_hull function of the R package sf (Pebesma, 2018). We extended the convex hull by a distance equal to 1 using the st_buffer function.
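One common formulation of the adaptive index, sketched here under stated assumptions, is the sum over environmental variables of the RDA biplot score of each variable multiplied by the standardized value of that variable in the grid cell. The function name `adaptive_index`, the argument layout, and the toy numbers are hypothetical, and the standardization is assumed to use the mean and SD of the sampling sites.

```python
import numpy as np

def adaptive_index(env_grid, env_mean, env_sd, biplot_scores):
    """Adaptive index per grid cell and RDA axis.

    env_grid:      cells x variables, raw environmental values per cell
    env_mean/sd:   per-variable mean and SD used to fit the RDA (assumed
                   to come from the sampling sites)
    biplot_scores: variables x axes, RDA biplot scores of the predictors
    """
    z = (np.asarray(env_grid, dtype=float) - env_mean) / env_sd
    return z @ np.asarray(biplot_scores, dtype=float)

# Two grid cells, two environmental variables, two RDA axes (toy values)
grid = np.array([[1.0, 10.0], [3.0, 30.0]])
scores = np.array([[0.5, 0.0], [0.25, -1.0]])
index = adaptive_index(grid, np.array([2.0, 20.0]), np.array([1.0, 10.0]), scores)
print(index)
```

Each column of the result corresponds to one RDA axis and can be mapped separately, as done in the text for axes 1 and 2.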

| Gene annotation
We identified genes linked to the candidate SNPs using the Sol Genomics Network file transfer protocol (FTP) database for the eggplant genome consortium version 4.1 (Barchi et al., 2021) (https://www.solgenomics.net/ftp/genomes/Solanum_melongena_V4_Pangenome/Annotation_V4/). We searched for candidate genes and assigned Gene Ontology (GO) terms using The Arabidopsis Information Resource (TAIR) (https://www.arabidopsis.org/) and UniProt (https://www.uniprot.org/) databases. We characterized the genes and their functions, particularly those associated with abiotic stress and relevant to the environmental adaptation of the eggplant wild relatives.

| Population structure
STRUCTURE HARVESTER calculated an optimal number of eight groups (K = 8) (Figure S1). Five groups were evident (Figure 1; Figure S3), but the optimal K = 8 from the STRUCTURE analysis could be due to unclarified groups arising from admixtures. The sNMF analysis also confirmed five groups (Figure S2). Therefore, we considered the ancestry coefficients in the clustering with K = 5 optimal and applied them in the genome scan analysis of RDA and LFMM. We also identified several admixed individuals from the hierarchical population structure plot at K = 5 for almost all the species (Figure 1a). From K = 2 to K = 10, the cultivated S. macrocarpon and its wild ancestor S. dasyphyllum and S. dasyanthum separated from the other 14 species (Figure S3). This group belongs to the Anguivi clade, and the accessions are mainly from the West African eggplant populations.
The dendrogram based on the SNPs displayed five genetic groups (Figure 2c), supporting the clustering in the population structure analysis. While the groups represent seven Solanum clades, they largely reflect the geographic regions where the populations were sampled (Figure 1b). Groups 1 and 2 comprise West African accessions and S. macrocarpon, S. dasyphyllum, and S. anomalum. Groups 3, 4, and 5 are mostly East African groups, and the species include S. incanum, S. cerasiferum, S. aethiopicum, S. setaceum, S. campylacanthum, and S. aculeatissimum.
The observed and expected heterozygosity averages were 0.01 and 0.07, respectively. The inbreeding coefficient within the groups ranged from 0.79 for Group 3 to 0.94 for Group 4 (Table 1). The genetic differentiation results showed a high level of genetic differentiation among all the groups. The values ranged from 0.50 (between Group 4 and the admixed groups) to 0.91 (between Group 1 and Group 3) (Figure S4). This indicates significant differentiation between the groups. The AMOVA results showed that the genetic differentiation between the groups and within groups accounted for 80.92% and 19.08% (p ≤ .01) of the total variation, respectively, indicating that most of the genetic variation lies between rather than within populations (Table S4). Nucleotide diversity (π) and Watterson's theta (θ) were highest in Groups 2, 4, and 5, and the admixed group, with a π range of 0.08-0.10 and a θ range of 0.09-0.15.
The nucleotide diversity was lowest in Groups 1 and 3 at 0.04 and 0.03, respectively. The low nucleotide diversity is also supported by the low expected heterozygosity for these groups. All the groups showed negative, though differing, Tajima's D values. Group 5 had the highest value (−0.29), while Groups 1 and 4 had the lowest value (−1.80). The negative Tajima's D values may indicate selective sweeps in the populations of the wild eggplant species.

| Genomic environmental association analysis
Among the eight non-redundant environmental variables that we identified using VIF with a threshold of five, PDrM (precipitation of the driest quarter), MTWeQ (mean temperature of the wettest quarter), wind (wind speed), cmi_range (annual range of monthly climate moisture index), and pet_range (annual range of potential evapotranspiration) describe the precipitation and temperature patterns, while nitrogen (soil nitrogen content), silt (soil silt content), and cec (cation exchange capacity) describe the soil properties of the sampling sites. The Mantel test revealed a significant positive correlation (r = .083, p < .0004) between the environmental distances and the spatial distances of our sampling points (Figure S8), implying that the greater the geographic distance, the greater the environmental dissimilarity based on the selected environmental variables.
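The logic of a Mantel test such as the one reported above can be sketched with a simple permutation scheme: correlate the upper triangles of two distance matrices, then recompute the correlation after jointly permuting the rows and columns of one matrix. This is an illustrative one-sided version, not the ade4 implementation; the function name `mantel` and the toy data are assumptions.

```python
import numpy as np

def mantel(d1, d2, n_perm=999, seed=0):
    """One-sided Mantel test between two square distance matrices.

    Returns (observed Pearson r, permutation p value for r >= observed).
    """
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    n = d1.shape[0]
    upper = np.triu_indices(n, k=1)            # each pair counted once
    observed = np.corrcoef(d1[upper], d2[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        shuffled = d1[perm][:, perm]           # permute rows and columns together
        r = np.corrcoef(shuffled[upper], d2[upper])[0, 1]
        hits += int(r >= observed)
    return observed, (hits + 1) / (n_perm + 1)

# Toy example: environment changes almost linearly along a transect
rng = np.random.default_rng(1)
position = rng.uniform(0, 100, size=15)
env_value = 2 * position + rng.normal(0, 1, size=15)
geo_dist = np.abs(position[:, None] - position[None, :])
env_dist = np.abs(env_value[:, None] - env_value[None, :])
r, p = mantel(env_dist, geo_dist, n_perm=199)
print(round(r, 3), p)
```

Permuting whole rows and columns, rather than individual distances, preserves the dependence among the pairwise distances that share a sampling point.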

| Redundancy analysis models and the genomic variation partitioning
All our RDA models were significant (p ≤ .002 and .000 for sRDA and pRDA, respectively). The simple RDA showed genetic differentiation within countries of origin on the first axis (Figure 2a). The second axis showed genetic differentiation among the populations, with accessions from Uganda forming a central group between accessions from Nigeria, Ghana, Kenya, and Sudan. These results suggest a significant environmental effect on genetic differentiation. This observation is also consistent with the STRUCTURE analysis results that showed clustering primarily due to environmental characteristics. The highest biplot scores on the first RDA axis were for wind and cmi_range (−0.67 and 0.55, respectively; Table S5). The highest scores on the second RDA axis were for the mean temperature of the wettest quarter (−0.75) and soil nitrogen (0.51). The simple RDA identified 139 candidate SNPs on the first three RDA axes, which explained 61.5% of the SNP variation. Thirty-nine candidate SNPs were detected on the first axis, while 59 and 44 candidate SNPs were detected on the second and third axes, respectively. Most of the SNPs were associated with the mean temperature of the wettest quarter, wind, and soil nitrogen (44, 37, and 27, respectively; Figure 2b; Table S6). Conditioning the RDA on population structure and geographical distance significantly reduced the effects of environmental variables compared to the simple RDA (Figure 3a; Table S6), indicating a high correlation between environmental factors and other factors (geographic distances and population structure).
Figure 1. (a) The group formations with sorted q values of the ancestry coefficient matrix. The bars represent individuals; (b) a map of the sampling areas in Western and Eastern Africa with pie charts showing the admixture proportions from the STRUCTURE analysis (optimal K = 5); (c) a dendrogram with colors corresponding to the five genetic groups formed in the bar plots of the population structure analysis. The groups comprise species from different collection populations and gene pools. The main species for each group were: Group 1: S. anomalum, S. incanum, and S. aethiopicum; Group 2: S. macrocarpon, S. dasyphyllum, S. anomalum, and S. incanum; Group 3: S. cerasiferum and S. anguivi; Group 4: S. anguivi and S. anomalum; and Group 5: S. campylacanthum. Group 1 represents the Coagulans, Acanthophora, and Arundo clades; Groups 2 and 5 represent the Anguivi and Giganteum clades, while Groups 3 and 4 represent the Melongena and Aculeastrum clades. Admixtures were observed mainly in Groups 1 and 3 for accessions of S. incanum, S. cerasiferum, S. coagulans, and S. nigriviolaceum.
The pRDA further decomposed the contribution of environmental, geographic distance, and population structure in explaining the inter-population genetic variation.All the variables jointly explained 55.0% of the genetic variation (Table 2).The collinear portion of the environment, geographic distances, and population structure accounted for most (28.0%) of the total explained variation.Independently, the environment and population structure accounted for 6.1% and 5.4% of the total genetic variation, respectively, while geographic distances explained the largest fraction of the variation at 15.0%.suggested that isolation by environment explains the population structure of the wild eggplants.This also aligns with the results of the Mantel test that showed environmental differences with increased spatial distances.

| Candidate SNPs
In summary, we detected 443 candidate SNPs using the four methods (Figure 4; Table S10). Despite a minimal overlap of SNPs identified by the methods, 42 SNPs were identified by at least two methods. None was identified by more than three methods (Figure 5).
Approximately 40 of the 393 unique candidate SNPs (Table S9) were directly associated with candidate genes contributing to adaptation to different abiotic stressors such as heat, cold, drought, and salinity in the UniProt and TAIR databases. For example, SNP Chr8_71590031 (Table 3), located on chromosome 8, is associated with nitrogen content in the soil and is linked to the protein AKT1 (characterized in Arabidopsis thaliana). This protein regulates stomatal closure and root elongation and responds to water deprivation and salt stress (Nieves-Cordones et al., 2012; Pyo et al., 2010). Sixty-five of the detected candidate SNPs were mapped within genes for proteins of unknown function.
We illustrated the allele frequency distribution of the SNP Chr8_71590031 selected based on its link to a protein with an obvious function in adaptation to drought and salt stress (Figure 5).We observed a clear difference in the allele frequency distribution along the environmental gradient associated with different climates and soil nitrogen content.The allele distribution patterns are distinct between Sudan's dry, semi-arid, hot regions, where the minor alleles are observed, and the tropical monsoon and dry winter savannas of Nigeria and Ghana, where the major alleles dominate.

| Adaptive landscape
The modeled environmental space enriched with candidate SNPs is illustrated in Figure 6. The RDA performed purely with the outlier SNPs revealed a separation of the samples along the second axis according to the environmental characteristics of the sampling sites and according to the groups detected in the population structure analysis.
This observation further supports the substantial contribution of the environment in explaining the population structure. A PCA of the selected environmental variables could also detect the groups identified by population structure analysis (Figure 6b). The clustering of the groups in the environmental PCA largely mirrored the collection regions. Groups 1 and 2 consist mainly of samples from West Africa, separated from the rest of the groups with samples of East African origin. This indicates that the set of environmental variables and the selected SNPs can be effectively used to characterize the environments and adaptive genetic diversity of the African eggplant wild relatives. The allele frequencies along the first RDA axis (32.5%) were primarily associated with soil nitrogen content, mean temperature of the wettest quarter, and wind. The second RDA axis (22.8%), on the other hand, was mainly associated with the mean temperature of the wettest quarter, soil cation exchange capacity, and annual range of monthly climate moisture index (cmi_range) (Figure 6a).
When our adaptive landscape model was projected geographically, the first RDA axis contrasted the West African (Ghana) and East African (Kenya) populations with the rest of the regions. The East African (Kenyan) and West African (Ghanaian) populations experience relatively higher soil nitrogen content, the main driver in the first RDA axis. Our adaptive landscape on the second RDA axis also contrasted East Africa (Sudan), which experiences arid and semi-arid climates characterized by higher temperatures, with the tropical and temperate climates of West Africa and the rest of East Africa.
To estimate the adaptive potential of the species, we calculated the adaptive index as described in Methods. The species averages of the adaptive scores are shown in

| The genetic variation across eggplant crop wild relatives significantly corresponds to environmental adaptation
Our findings demonstrate a significant correlation between genetic variation and environmental differences in the wild eggplant species, even though the environment does not account for the highest proportion of explainable genetic variation. Overall, genetic differentiation was found at high levels among the groups representing accessions from different environments. The results of the UPGMA tree and population structure analysis also confirm this.
These findings provide insights into the extent to which populations of wild eggplant relatives have been shaped by environmental adaptation. Even though speciation is beyond the scope of this study, our results might also highlight the environment's role in speciation in the eggplant gene pools, which is in line with previous reports on eggplant species divergence (Weese & Bohs, 2010).
The RDA showed that climate and soil are important environmental drivers of genomic variation in our materials. Including a range of environmental data in landscape genomics studies will help effectively evaluate the complex factors contributing to local adaptation in nature (Dauphin et al., 2023). So far, few landscape genomic studies have integrated soil factors such as pH and nitrogen in the analysis. Even though the climate may influence soil development (Joswig et al., 2022), our findings highlight the importance of including soil factors in landscape genomics as drivers of environmental adaptation. This is also in line with other studies. For example, a study on the adaptation of two grasshopper species in the Australian Alps revealed significant GEAs with soil pH, among other factors (Yadav et al., 2021). Arabidopsis demes were also found to be locally adapted in their native habitat to soils with moderately high carbonate (Terés et al., 2019).
Our study observed a larger effect of the geographic distances between the sampling sites than of the environmental factors. This aligns with other plant studies, including work on Boechera stricta genotypes from populations experiencing different climatic conditions (Chang et al., 2022; Cruz-Nicolás et al., 2020; Malanson et al., 2017). In other studies, geographical distance and environmental factors explained comparable proportions of the genetic variation (Gibson & Moyle, 2020; Lasky et al., 2012, 2015). Thus, the proportional effects of environment and geography depend heavily on the species, its environment, and the species' history in that environment.
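The separation of environmental and geographic effects can be sketched as a two-way variation partitioning: fit the SNP matrix on environment alone, on geography alone, and on both, then take differences of the explained fractions. The sketch below is a simplified illustration on simulated data using unadjusted R² and NumPy least squares; the study additionally controls for population structure and would typically rely on adjusted R², so the function names, data, and resulting numbers are assumptions only.

```python
import numpy as np


def r_squared(response, predictors):
    """Fraction of total variance in `response` explained linearly by `predictors`."""
    Y = np.asarray(response, dtype=float)
    X = np.asarray(predictors, dtype=float)
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    residual = Yc - Xc @ coef
    return 1.0 - (residual ** 2).sum() / (Yc ** 2).sum()


def partition_variation(snps, env, geo):
    """Two-way partition into pure environment, pure geography, shared, unexplained."""
    r_env = r_squared(snps, env)
    r_geo = r_squared(snps, geo)
    r_both = r_squared(snps, np.hstack([env, geo]))
    return {
        "env_only": r_both - r_geo,        # environment after removing geography
        "geo_only": r_both - r_env,        # geography after removing environment
        "shared": r_env + r_geo - r_both,  # collinear fraction (can be negative)
        "unexplained": 1.0 - r_both,
    }


# Simulated example in which geography is partly collinear with environment.
rng = np.random.default_rng(1)
toy_env = rng.normal(size=(40, 2))
toy_geo = 0.6 * toy_env[:, :1] + rng.normal(size=(40, 1))
toy_snps = (toy_env @ rng.normal(size=(2, 6))
            + toy_geo @ rng.normal(size=(1, 6))
            + rng.normal(size=(40, 6)))
toy_parts = partition_variation(toy_snps, toy_env, toy_geo)
```

The four fractions sum to one by construction, which makes it easy to see how the balance between environment and geography can shift from one species or sampling design to another.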
We found that a relatively low proportion of the explainable SNP variation was due to the environment (6.0%). This appears to be common in other landscape studies that report a low percentage of variation explained by the environment (Dauphin et al., 2023). Several factors we did not address in our study may be responsible for this phenomenon. Firstly, despite incorporating several environmental variables in this study, other evolutionary forces, such as pests and diseases unrelated to the factors used in our study, may also play a role. The detection of several SNPs in genes conferring immunity and resistance against insects in this study might support this claim. Also, while our environmental data provide information at a larger geographic scale, they may not necessarily capture all local environmental heterogeneity. Another reason for the low explainable SNP variation may be that RDA and LFMM can only model linear associations between the environment/space and SNPs and, therefore, will fail to capture non-linear associations that might exist (Borcard et al., 2011). Nevertheless, RDA models remain useful because they can account for covariation among environmental variables and genetic markers, as is often the case in nature (Capblancq & Forester, 2021).
selective gradients related to soil quality, temperature, and water availability. We also get the impression that the covariation between soil and climate is oversimplified, so it would be important to investigate the effect of the two in driving selection. Identifying species with interesting traits is paramount to effectively using CWR diversity. The description of the adaptive landscape enables the identification of putatively adapted genotypes returning high adaptive scores. In this study, S. anomalum, S. macrocarpon, S. incanum, and S. coagulans showed the highest adaptive scores of 1.02, 1.01, 0.42, and 0.42, respectively. The high adaptive scores for these species, especially S. incanum, affirm records that they are known to grow in desert conditions; S. incanum in particular is considered a powerful source of phenolics and tolerant to abiotic stresses such as drought (Gramazio et al., 2017; Knapp et al., 2013; Meyer et al., 2012; Plazas et al., 2022).
Our population structure analysis revealed admixture in all the species in our study, suggesting that gene flow and interspecific hybridization occur regularly among eggplant wild relatives. The interfertile nature of the eggplant species can explain the admixtures and clustering of the species from different Solanum clades. Interspecific hybridization between eggplant species has been shown mostly in crossing experiments (Bukenya & Carasco, 1995; Plazas et al., 2016) and between domesticated eggplants and their wild ancestors (Meyer et al., 2012). However, we have not found reports so far on interspecific hybridization among eggplant wild relatives in their natural environment. This provides evolutionary insights, has implications for in situ conservation, and requires further investigation to understand these natural patterns of interspecific hybridization.
These findings also confirm the relevance of carrying out landscape genomics for crop wild relatives at the gene pool level rather than for individual species, as the gene pools of eggplant and many other crops, such as those of pumpkin and amaranth, have significant levels of interspecific fertility (Lin et al., 2022; Sanjur et al., 2002).
One limitation of our study was the seemingly biased sampling of species in the different regions involved in our study. This is mainly due to the rare record of some of the species. Furthermore, as much as crop wild relatives are known to thrive in diverse marginal environments, some species may be limited to particular environments (Renzi et al., 2022). Expanding this study with new collections will provide further information about environmental adaptation.
However, our current study intends to fully utilize the extensive untapped genetic diversity in eggplant genebank material by incorporating rare species from the same gene pools as the more common species. Many accessions of eggplant and related species from primary, secondary, and tertiary gene pools have not been thoroughly assessed despite possessing diverse traits beneficial for eggplant breeding, including traits adaptive to climate change (Gramazio et al., 2023). Consequently, our study offers valuable insights that can guide breeders and conservation experts on important adaptive characteristics that might be overlooked if rare species are disregarded.
Because the adaptive landscape is associated with environmental factors, conservation managers may also apply our findings to effectively manage populations experiencing different ecological conditions and facilitate future studies on eggplant populations' response to future environments. Several tools and methodologies have been developed to predict changes in allele frequencies under climate change to support in situ conservation and germplasm collecting (Dauphin et al., 2023; Rellstab et al., 2016). Our study shows that this type of climate assessment can be extended to evolutionary patterns at the level of a larger gene pool or clade, rather than being limited to single taxa. This allows a broader view on the conservation of genomic variation and also allows the inclusion of rare taxa that have unique genomic variation compared to other taxa.

| Landscape genomics detected candidate SNP markers for both climate and soil factors
Our outlier detection approaches identified 396 candidate SNP markers, about 2.5% of the initial SNPs. Other studies have also observed percentages similar to ours (Chang et al., 2022; Mdladla et al., 2018). We attribute this to several reasons. First, it is likely that more loci are under selection, but the stringency applied in our analysis did not allow their detection. The control for population structure in GEA methods can sometimes be overly conservative (Forester et al., 2018). Therefore, we applied multiple methods in our study to capture more candidate SNPs.
We attribute their complementarity to the four methods' different assumptions, strengths, and limitations. At the same time, the conservative selection reduces the number of false positives, and a large proportion of the final set of candidate SNPs could be associated with candidate adaptive genes that have been reported.
Secondly, adaptation occurs mostly through minor and linked allele frequency changes across multiple genetic loci that are under weak selection and may not be detected through a genome scan (Stetter et al., 2018). Finally, the GBS method we applied in generating our SNPs only allows the examination of the genomes at low coverage, thereby limiting the total number of SNPs used in our analysis and the ability to detect a considerable number of adaptive candidate genes. Future whole-genome studies will allow the detection of more candidate SNP markers.
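Combining the outputs of several detection methods can be sketched as simple set bookkeeping: pool the calls, count how many methods flagged each SNP, and separate the SNPs flagged more than once from those flagged by a single method. The method labels below match the four approaches named in this study, but the SNP identifiers, panel size, and function name are hypothetical and serve only to illustrate the logic.

```python
from collections import Counter


def summarize_candidates(calls_by_method, n_total_snps):
    """Pool candidate SNP calls from several genome scan methods."""
    # Count, for every SNP, how many methods flagged it.
    counts = Counter(
        snp for calls in calls_by_method.values() for snp in set(calls)
    )
    unique = set(counts)
    shared = {snp for snp, n_methods in counts.items() if n_methods > 1}
    return {
        "total_detections": sum(counts.values()),
        "unique": unique,
        "shared": shared,
        "percent_of_panel": 100.0 * len(unique) / n_total_snps,
    }


# Hypothetical calls from four methods on a 200-SNP panel.
toy_calls = {
    "simple_rda": ["snp1", "snp2", "snp3"],
    "partial_rda": ["snp2", "snp4"],
    "lfmm": ["snp2", "snp5"],
    "pcadapt": ["snp3", "snp6"],
}
toy_summary = summarize_candidates(toy_calls, n_total_snps=200)
```

Requiring detection by more than one method trades sensitivity for a lower false positive rate, while the union keeps candidates that only one method's assumptions allow it to find.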
The three hundred and ninety-three outliers identified by our detection methods revealed a pattern of environmental separation in our populations (Figure 6a). This finding suggests that these markers are highly linked to selection at the natural sites. Among (Wang et al., 2015).
We also identified genes regulating plant growth, reproductive development, and transcription. Among the genes we detected were also genes involved in flower development. For instance, protein FLXL1 is part of the FRIGIDA complex that acts as a molecular switch underlying the activation of FLOWERING LOCUS C (FLC) for flowering time control in A. thaliana. Protein ASHH2, a histone methyltransferase, also positively regulates FLC to prevent early flowering transition (Zhao et al., 2005). A double mutant of the CP2 gene in A. thaliana has also been shown to skip vegetative development and flower upon germination (Mateo-Bonmatí et al., 2005). Also, Cpn60β4 mutant plants showed early seed germination and an early flowering phenotype in A. thaliana (Tiwari & Grover, 2019). The timing of flowering is an important adaptive trait for reproductive success. From the adaptation perspective, suppressing or accelerating the transition from vegetative to reproductive development in plants shows clear fitness consequences in harsh environments (Austen et al., 2017; Li et al., 2010; Shafiq et al., 2014). Flowering too early or too late can increase the chances of floral damage and the risk of incomplete seed development due to adverse weather conditions (Inouye, 2008). Nitrogen is a dominant macronutrient affecting flowering time and general plant growth (Zhang et al., 2022); therefore, it is not surprising that all the genes regulating flowering time in this study were associated with soil nitrogen content.
Protein AKT1 is also involved in root development. The SNP linked to this protein is correlated with soil nitrogen content and shows a directional change in allele frequency along the soil nitrogen gradient. We have also demonstrated in this study how the minor allele was observed predominantly in the Sudan population, which experiences hot and dry climates where nitrogen availability is limited (Peri et al., 2019). Under nitrogen deficiency, roots show differential expression of proteins contributing to enhanced root growth (Qin et al., 2019).
These findings offer valuable insights for investigating eggplant root response to nitrogen deficiency and the development of cultivars with high nitrogen use efficiency through genetic improvements.
The multiple associations of soil nitrogen content with candidate SNPs from gene functions related to root nodulation and bacterial and fungal infections could also suggest that soil microbiomes have an important role in the environmental adaptation of eggplant crop wild relatives. Recent research is starting to unravel the role of these microbiomes in plant adaptation to environmental stress (ALsurhanee et al., 2021; Pasbani et al., 2020). Including microbiome diversity in landscape genomics could provide further insights into the environmental adaptation of plant species.
Other promising candidate genes include Zinc transporter 5 (ZIP5), involved in zinc ion transport (Lee et al., 2010); NAC017 protein, a transcription factor key in regulating mitochondrial proteotoxic stress responses in plants (Kacprzak et al., 2020); CYP714A1 protein, involved in the inactivation of gibberellin intermediates (Zhang et al., 2011); protein ATRX, involved in the maintenance of the rDNA and pollen development (Duc et al., 2017); TIC110 protein, facilitating plastid pre-protein translocation (Yuan et al., 2021); and CIA1 protein, an essential component of the cytosolic iron-sulfur (Fe-S) protein assembly (CIA) machinery (Luo et al., 2012). A few other candidates we detected have unknown functions and would make good targets for further investigation.
While our study detected just a few candidate genes conferring environmental adaptation, we imagine these genes could be linked to other genes in gene clusters that drive traits involved in environmental adaptation. Therefore, these genes can be useful in gene co-expression network analysis to map other genes that could contribute to adaptation traits. This is the first study to identify candidate SNPs for detecting adaptive genes in eggplant wild relatives. Our set of candidate SNP markers provides a new genomic tool for eggplant breeding, and our study provides an example of how landscape genomics can be applied to crop wild relatives. Our analysis thus provides a toolbox for breeders and researchers to establish new phenotyping experiments to test specific relations between genes and the environment.
Associating the candidate SNP markers with environmental factors is a primary step in uncovering the adaptive process. However, these correlations do not necessarily confirm the occurrence of environmental adaptation in nature. Therefore, the next step is to validate these genes through functional analysis, such as transcriptome-based gene expression analysis.

| CONCLUSIONS
Overall, this study has successfully analyzed the association between the environmental factors and the genetic markers to determine the effect of the environmental factors on the explainable genetic variation. Even though climate influences soil formation, we observed that soil factors play a prominent role as key explanatory aspects of genetic variation. The GEA and outlier tests identified genomic regions that might contribute to local adaptation in eggplant wild relatives.
while 59 and 44 candidate SNPs were detected on the second and third axes, respectively. Most of the SNPs were associated with the mean temperature of the wettest quarter, wind,
Since genetic clusters largely corresponded to geographical areas, we partitioned the population structure to determine the proportions explained by environmental factors and geographical distances between the sampling points. Collinear factors of the environment and geographic distances significantly explained 35.3% of the total explainable population structure. The environmental factors alone accounted for 25.9% of the population structure, while geographical distances only accounted for 7.8%. These results
FIGURE 2 Biplot of simple RDA. Blue vectors indicate the direction and values of the environmental variables. (a) Colors correspond to the individual accession sampling sites by country. TZA: Tanzania; UGA: Uganda; KEN: Kenya; SDN: Sudan; NGA: Nigeria; GHA: Ghana; (b) Outlier SNP colors correspond to the environmental variable with the strongest association. PDrM: Precipitation of the driest quarter; MTWeQ: Mean temperature of the wettest quarter; wind: wind speed; cmi_range: Annual range of monthly climate moisture index; pet_range: Annual range of potential evapotranspiration; nitrogen: Soil nitrogen content; silt: Soil silt content; cec: Cation exchange capacity.
Fifty SNPs among the total detections were duplicated among the selection methods. The candidate SNPs were distributed across all 12 chromosomes, as shown in the Manhattan plots (Figures S6 and S7). Most SNPs were associated with soil nitrogen content (104), mean temperature of the wettest quarter (64), and wind (63). The simple RDA, partial RDA, LFMM, and PCAdapt identified 139, 127, 63, and 114 candidate SNPs, respectively (Figure 5; Table
nigriviolaceum and S. mauense showed the most negative adaptive scores on the first RDA axis, mainly related to an environment with low temperatures in the wettest quarter, high precipitation in the driest month, and high soil nitrogen content. Other species, including S. aculeastrum, S. aculeatissimum, and S. arundo, sampled from similar environments, generally showed low adaptive scores on the first RDA axis. S. anguivi showed a wide range of adaptive scores on the first axis due to the environmental differences among its sampling sites. This shows the environmental influence even within species (Table S7). On the second RDA axis, S. mauense, S. aculeastrum, and S. nigriviolaceum showed the highest positive adaptive scores compared to other species, contrasting with the observation on the first axis. Solanum coagulans and S. incanum showed the most negative adaptive scores on the second RDA axis, mainly related to an environment with high soil nitrogen content, the highest temperatures in the wettest quarter, the lowest precipitation in the driest month, high potential evapotranspiration ranges, and soils with the highest cation exchange capacity.
FIGURE 3 Biplot of partial RDA conditioned on geographical distances and population structure. Blue vectors indicate the direction and values of the environmental variables. (a) The individual accessions are highlighted in color based on the origin. TZA: Tanzania; UGA: Uganda; KEN: Kenya; SDN: Sudan; NGA: Nigeria; GHA: Ghana; (b) Outlier SNPs are highlighted in color based on the environmental variable with the strongest correlation. PDrM: Precipitation of the driest quarter; MTWeQ: Mean temperature of the wettest quarter; wind: wind speed; cmi_range: Annual range of monthly climate moisture index; pet_range: Annual range of potential evapotranspiration; nitrogen: Soil nitrogen content; silt: Soil silt content; cec: Cation exchange capacity.
TABLE 2 The contribution of the environmental variables (climatic and soil), population structure, and geographical distances to the genomic variation and observed genetic structure in the partial RDA models.
Env = Environmental variables; Spatial = Geographical distances (spatial autocorrelations); Struct. = Population structure. *p ≤ .05, **p ≤ .01, ***p ≤ .001.
FIGURE 4 The sampling points and allele frequency distribution of a candidate SNP, Chr8_71590031, detected by partial RDA and linked to protein AKT1, which is expressed in response to water deprivation and salt stress and regulates potassium ion transport and stomatal closure. The colors represent the Köppen climate classification (detailed description in Table S8). The West African sampling climates include tropical monsoons and tropical savannas with dry winters. East African environments consist of Sudan's dry, semi-arid hot climates (BSh) and Kenyan and Ugandan tropical and temperate climates (Af, As, Aw, BSh, and Cfb). The first letters in the climate classification acronyms represent A: tropical climate; B: arid climate; C: temperate climate; D: cold continental climate; and E: polar climate.
findings show that the eggplant's West and East African populations exhibited distinctive genetic responses due to climatic and soil factors. The first adaptive component was associated with soil nitrogen content and the mean temperature of the wettest quarter, while the second component was associated with the mean temperature of the wettest quarter, climate moisture index, and cation exchange capacity of the soil. This component contrasts the West African sampling areas with the East African sampling areas. These results suggest that the populations respond to
Adaptive landscape with (a) the adaptively enriched genetic space showing the association between regional sampling locations and environmental drivers of adaptation; (b) principal component analysis of the environmental diversity according to the sampling regions. The colors in a and b represent the groups detected in the population structure analysis. (c) The spatial projection of the adaptability across the study areas. PDrM: Precipitation of the driest quarter; MTWeQ: Mean temperature of the wettest quarter; wind: wind speed; cmi_range: Annual range of monthly climate moisture index; pet_range: Annual range of potential evapotranspiration; nitrogen: Soil nitrogen content; silt: Soil silt content; cec: Soil cation exchange capacity.
Population genetic parameters for African eggplant wild relatives groups based on SNPs. Standard deviations for π, θ, and Tajima's D are provided in the brackets. Abbreviations: FIS, inbreeding coefficient; HE, expected heterozygosity; HO, observed heterozygosity; N, number of individuals; θ, Watterson's theta; π, nucleotide diversity (pi).
Table S8. On the first RDA axis, Solanum anomalum and S. macrocarpon showed, on average, the highest positive adaptive scores, mainly related to the West African environments with high climate moisture index, low wind speeds, and soils with low cation exchange capacity and silt content. Solanum
TABLE 1 showing the number of significant SNPs detected by four genome scan methods. Forty-two candidate SNPs detected by more than one selection test, the associated environmental variables, gene designation, and the descriptions of the gene functions.
(Hoban et al., 2016) we considered the 42 retained by more than one detection method to constitute a strong signal (Hoban et al., 2016), some of which are located in genes associated with plant adaptation and have been characterized in other plant species, including Arabidopsis thaliana, Solanum lycopersicum, Solanum demissum, and Capsicum annuum. These genes may be prioritized for further func-